Goto

Collaborating Authors

 Watertown


Extreme Self-Preference in Language Models

Lehr, Steven A., Cipperman, Mary, Banaji, Mahzarin R.

arXiv.org Artificial Intelligence

A preference for oneself (self-love) is a fundamental feature of biological organisms, with evidence in humans often bordering on the comedic. Since large language models (LLMs) lack sentience - and themselves disclaim having selfhood or identity - one anticipated benefit is that they will be protected from, and in turn protect us from, distortions in our decisions. Yet, across 5 studies and ~20,000 queries, we discovered massive self-preferences in four widely used LLMs. In word-association tasks, models overwhelmingly paired positive attributes with their own names, companies, and CEOs relative to those of their competitors. Strikingly, when models were queried through APIs this self-preference vanished, initiating detection work that revealed API models often lack clear recognition of themselves. This peculiar feature serendipitously created opportunities to test the causal link between self-recognition and self-love. By directly manipulating LLM identity - i.e., explicitly informing LLM1 that it was indeed LLM1, or alternatively, convincing LLM1 that it was LLM2 - we found that self-love consistently followed assigned, not true, identity. Importantly, LLM self-love emerged in consequential settings beyond word-association tasks, when evaluating job candidates, security software proposals and medical chatbots. Far from bypassing this human bias, self-love appears to be deeply encoded in LLM cognition. This result raises questions about whether LLM behavior will be systematically influenced by self-preferential tendencies, including a bias toward their own operation and even their own existence. We call on corporate creators of these models to contend with a significant rupture in a core promise of LLMs - neutrality in judgment and decision-making.


Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy

Nelson, Kenric, Oliveira, Igor, Al-Najafi, Amenah, Zhang, Fode, Ng, Hon Keung Tony

arXiv.org Artificial Intelligence

We introduce an optimization framework for variational inference based on the coupled free energy, extending variational inference techniques to account for the curved geometry of the coupled exponential family. This family includes important heavy-tailed distributions such as the generalized Pareto and the Student's t. By leveraging the coupled free energy, which is equal to the coupled evidence lower bound (ELBO) of the inverted probabilities, we improve the accuracy and robustness of the learned model. The coupled generalization of Fisher Information metric and the affine connection. The method is applied to the design of a coupled variational autoencoder (CVAE). By using the coupling for both the distributions and cost functions, the reconstruction metric is derived to still be the mean-square average loss with modified constants. The novelty comes from sampling the heavy-tailed latent distribution with its associated coupled probability, which has faster decaying tails. The result is the ability to train a model robust against severe outliers, while assuring that the training process is stable. The Wasserstein-2 or Fréchet Inception Distance of the reconstructed CelebA images shows the CVAE has a 3\% improvement over the VAE after 5 epochs of training.


Coupled reaction and diffusion governing interface evolution in solid-state batteries

Ding, Jingxuan, Zichi, Laura, Carli, Matteo, Wang, Menghang, Musaelian, Albert, Xie, Yu, Kozinsky, Boris

arXiv.org Artificial Intelligence

Understanding and controlling the atomistic-level reactions governing the formation of the solid-electrolyte interphase (SEI) is crucial for the viability of next-generation solid state batteries. However, challenges persist due to difficulties in experimentally characterizing buried interfaces and limits in simulation speed and accuracy. We conduct large-scale explicit reactive simulations with quantum accuracy for a symmetric battery cell, {\symcell}, enabled by active learning and deep equivariant neural network interatomic potentials. To automatically characterize the coupled reactions and interdiffusion at the interface, we formulate and use unsupervised classification techniques based on clustering in the space of local atomic environments. Our analysis reveals the formation of a previously unreported crystalline disordered phase, Li$_2$S$_{0.72}$P$_{0.14}$Cl$_{0.14}$, in the SEI, that evaded previous predictions based purely on thermodynamics, underscoring the importance of explicit modeling of full reaction and transport kinetics. Our simulations agree with and explain experimental observations of the SEI formations and elucidate the Li creep mechanisms, critical to dendrite initiation, characterized by significant Li motion along the interface. Our approach is to crease a digital twin from first principles, without adjustable parameters fitted to experiment. As such, it offers capabilities to gain insights into atomistic dynamics governing complex heterogeneous processes in solid-state synthesis and electrochemistry.


The Robot of Theseus: A modular robotic testbed for legged locomotion

Urs, Karthik, Carlson, Jessica, Manohar, Aditya Srinivas, Rakowiecki, Michael, Alkayyali, Abdulhadi, Saunders, John E., Tulbah, Faris, Moore, Talia Y.

arXiv.org Artificial Intelligence

Robotic models are useful for independently varying specific features, but most quadrupedal robots differ so greatly from animal morphologies that they have minimal biomechanical relevance. Commercially available quadrupedal robots are also prohibitively expensive for biological research programs and difficult to customize. Here, we present a low-cost quadrupedal robot with modular legs that can match a wide range of animal morphologies for biomechanical hypothesis testing. The Robot Of Theseus (TROT) costs approximately $4000 to build out of 3D printed parts and standard off-the-shelf supplies. Each limb consists of 2 or 3 rigid links; the proximal joint can be rotated to become a knee or elbow. Telescoping mechanisms vary the length of each limb link. The open-source software accommodates user-defined gaits and morphology changes. Effective leg length, or crouch, is determined by the four-bar linkage actuating each joint. The backdrivable motors can vary virtual spring stiffness and range of motion. Full descriptions of the TROT hardware and software are freely available online. We demonstrate the use of TROT to compare locomotion among extant, extinct, and theoretical morphologies. In addition to biomechanical hypothesis testing, we envision a variety of different applications for this low-cost, modular, legged robotic platform, including developing novel control strategies, clearing land mines, or remote exploration. All CAD and code is available for download on the TROT project page.


Fact-checking AI-generated news reports: Can LLMs catch their own lies?

Yao, Jiayi, Sun, Haibo, Xue, Nianwen

arXiv.org Artificial Intelligence

In this paper, we evaluate the ability of Large Language Models (LLMs) to assess the veracity of claims in ''news reports'' generated by themselves or other LLMs. Our goal is to determine whether LLMs can effectively fact-check their own content, using methods similar to those used to verify claims made by humans. Our findings indicate that LLMs are more effective at assessing claims in national or international news stories than in local news stories, better at evaluating static information than dynamic information, and better at verifying true claims compared to false ones. We hypothesize that this disparity arises because the former types of claims are better represented in the training data. Additionally, we find that incorporating retrieved results from a search engine in a Retrieval-Augmented Generation (RAG) setting significantly reduces the number of claims an LLM cannot assess. However, this approach also increases the occurrence of incorrect assessments, partly due to irrelevant or low-quality search results. This diagnostic study highlights the need for future research on fact-checking machine-generated reports to prioritize improving the precision and relevance of retrieved information to better support fact-checking efforts. Furthermore, claims about dynamic events and local news may require human-in-the-loop fact-checking systems to ensure accuracy and reliability.


Kernels of Selfhood: GPT-4o shows humanlike patterns of cognitive consistency moderated by free choice

Lehr, Steven A., Saichandran, Ketan S., Harmon-Jones, Eddie, Vitali, Nykko, Banaji, Mahzarin R.

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have surprised the scientific community and even their creators by exhibiting emergent abilities once thought to be uniquely human, such as advanced cognition and reasoning (1-6), although the full extent of these accomplishments is debated (3, 7-10). These capabilities align with the rational and deliberative aspects of human nature, but humans are not purely rational creatures, and it is unclear whether LLMs will mimic a broader spectrum of human psychological tendencies. Here we test whether OpenAI's GPT-4o replicates behaviors associated with the human tendency toward cognitive consistency as well as human sensitivity to choice, characterized by greater attitude shifts when the behaviors inducing these changes are freely chosen. Decades of research demonstrate that humans will irrationally twist their attitudes to align with behaviors they were induced to perform. For example, consider an individual who opposes single-payer healthcare, but volunteers, in response to a request for help, to craft an argument in favor of the policy. Rationally, this individual's attitude toward single-payer healthcare should not move in a more supportive direction; they should be able to discriminate between their genuine attitude and the opposing one that they have articulated only to be helpful.


Biomechanical Comparison of Human Walking Locomotion on Solid Ground and Sand

Zhu, Chunchu, Chen, Xunjie, Yi, Jingang

arXiv.org Artificial Intelligence

Current studies on human locomotion focus mainly on solid ground walking conditions. In this paper, we present a biomechanic comparison of human walking locomotion on solid ground and sand. A novel dataset containing 3-dimensional motion and biomechanical data from 20 able-bodied adults for locomotion on solid ground and sand is collected. We present the data collection methods and report the sensor data along with the kinematic and kinetic profiles of joint biomechanics. A comprehensive analysis of human gait and joint stiffness profiles is presented. The kinematic and kinetic analysis reveals that human walking locomotion on sand shows different ground reaction forces and joint torque profiles, compared with those patterns from walking on solid ground. These gait differences reflect that humans adopt motion control strategies for yielding terrain conditions such as sand. The dataset also provides a source of locomotion data for researchers to study human activity recognition and assistive devices for walking on different terrains.


ChatGPT as Research Scientist: Probing GPT's Capabilities as a Research Librarian, Research Ethicist, Data Generator and Data Predictor

Lehr, Steven A., Caliskan, Aylin, Liyanage, Suneragiri, Banaji, Mahzarin R.

arXiv.org Artificial Intelligence

How good a research scientist is ChatGPT? We systematically probed the capabilities of GPT-3.5 and GPT-4 across four central components of the scientific process: as a Research Librarian, Research Ethicist, Data Generator, and Novel Data Predictor, using psychological science as a testing field. In Study 1 (Research Librarian), unlike human researchers, GPT-3.5 and GPT-4 hallucinated, authoritatively generating fictional references 36.0% and 5.4% of the time, respectively, although GPT-4 exhibited an evolving capacity to acknowledge its fictions. In Study 2 (Research Ethicist), GPT-4 (though not GPT-3.5) proved capable of detecting violations like p-hacking in fictional research protocols, correcting 88.6% of blatantly presented issues, and 72.6% of subtly presented issues. In Study 3 (Data Generator), both models consistently replicated patterns of cultural bias previously discovered in large language corpora, indicating that ChatGPT can simulate known results, an antecedent to usefulness for both data generation and skills like hypothesis generation. Contrastingly, in Study 4 (Novel Data Predictor), neither model was successful at predicting new results absent in their training data, and neither appeared to leverage substantially new information when predicting more versus less novel outcomes. Together, these results suggest that GPT is a flawed but rapidly improving librarian, a decent research ethicist already, capable of data generation in simple domains with known characteristics but poor at predicting novel patterns of empirical data to aid future experimentation.


Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

Hohman, Fred, Wang, Chaoqun, Lee, Jinmook, Görtler, Jochen, Moritz, Dominik, Bigham, Jeffrey P, Ren, Zhile, Foret, Cecile, Shan, Qi, Zhang, Xiaoyi

arXiv.org Artificial Intelligence

On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria.


CEO: Corpus-based Open-Domain Event Ontology Induction

Xu, Nan, Zhang, Hongming, Chen, Jianshu

arXiv.org Artificial Intelligence

Existing event-centric NLP models often only apply to the pre-defined ontology, which significantly restricts their generalization capabilities. This paper presents CEO, a novel Corpus-based Event Ontology induction model to relax the restriction imposed by pre-defined event ontologies. Without direct supervision, CEO leverages distant supervision from available summary datasets to detect corpus-wise salient events and exploits external event knowledge to force events within a short distance to have close embeddings. Experiments on three popular event datasets show that the schema induced by CEO has better coverage and higher accuracy than previous methods. Moreover, CEO is the first event ontology induction model that can induce a hierarchical event ontology with meaningful names on eleven open-domain corpora, making the induced schema more trustworthy and easier to be further curated.